Abstract


To achieve reliable and persistent expression, a transgene should be precisely integrated into a genome
safe harbor (GSH) locus. Little attention has been paid to finding safe harbor loci in the chicken (Gallus gallus
domesticus) genome. Identification and characterization of GSH loci that allow persistent and reliable expression
of knock-in genes is a major area of interest within the field of transgenic technology and is central to the
development of transgenic livestock. Randomly integrated transgenes may encounter position effects and epigenetic
silencing, resulting in unstable phenotypes and in unreliable, unpredictable expression of the knock-in transgene.
In contrast to random gene insertion, site-specific gene targeting provides a superior strategy that exploits
homologous recombination to insert a transgene of interest into a pre-determined locus. In this study, based on
bioinformatics, gene expression atlas, and Hi-C analyses, a GSH region was predicted in the chicken genome
between the DRG1 and EIF4ENIF1 genes. To do so, we introduce a fast and easy-to-use pipeline that allows the
prediction of orthologous GSH loci in all organisms, especially chickens. In addition, the procedure to design
Cas9/gRNA expression and targeting vectors for targeting these predicted GSH regions is described in detail.


Keywords: Genome safe harbor loci, Genetically engineered birds, Transgenic chicken, CRISPR/Cas9, Gene 
expression atlas, Hi-C map. 


Introduction

It has become increasingly important to determine regions that support the integration and long-term expression of a transgene in the genome. Considerable efforts have been underway to elucidate genomic safe harbor (GSH) loci that could potentially support long-term expression within the field of transgenesis and recombinant protein production (Sadelain et al., 2011). The discovery of a GSH locus that allows reliable and consistent expression of a knock-in gene without triggering the functional disruption of endogenous genes is of utmost importance to develop bioreactors (Papapetrou et al., 2011; Ruan et al., 2015).

There is a growing number of strategies to screen and identify GSH loci. Traditional strategies are expensive, cumbersome, and time-consuming. For example, to identify and explore potential GSH loci, the “gene trapping” method has been used, which relies on random integration of a promoterless reporter construct across the genome to indicate the expression of an endogenous gene (Stanford et al., 2001). Then, the integration sites are evaluated to find the regions with the highest expression (Papapetrou et al., 2011). Such a reverse screening strategy is very laborious, as numerous sites which are subjected to reporter insertion should be analyzed. Another method to find potential GSH loci is based on whole transcriptome sequencing, which could be expensive and needs specialized analyses (Ma et al., 2018). In vivo imaging (Rizzi et al., 2017) of reporter animals, which could be performed to find suitable loci for transgene integration, is also time-consuming and uneconomical. More recently, a systematic approach that combines RNA-seq data with High-throughput Chromosome Conformation Capture (Hi-C) data was proposed to predict GSH regions (Hilliard and Lee, 2021). This approach is more informative and applicable, but its high cost is the main constraint against its universal use. However, it is helpful and cost-effective to perform data mining and similarity searches on data adopted from validated GSH loci databases to identify potential orthologous GSH regions. Hence, it would be easy to predict actively transcribed regions, transcriptionally permissive topologically associating domains (TADs), and nucleosome-poor regions (Fishman et al., 2019; Hilliard and Lee, 2021; Zhao et al., 2019). A comparative genomics approach could also be applied to screen and detect similar sites/homologous sequences among different species (Irion et al., 2007; Li et al., 2014; Wu et al., 2016; Yang et al., 2016b).

Here, we introduce a novel, cost-effective, and
easy-to-use pipeline to predict potential GSH loci
in eukaryotic genomes. We applied this pipeline
to analyze gene expression atlas and Hi-C data and
to identify a potential GSH locus in the chicken
genome. Then, we describe the procedure for
designing CRISPR-based targeting and Cas9/gRNA
expression vectors that are applicable for targeting
these predicted GSH regions.


Materials and Methods 


Pipeline for prediction of GSH loci 

We used a five-step, fast, and easy-to-use pipeline to predict an orthologous GSH locus in the Gallus gallus domesticus genome (Figure 1). In the first step, the genes around the validated GSH locus were located in the chicken genome based on their sequence similarities. The validated locus, the intergenic sequence between the DRG1 and EIF4ENIF1 genes, was previously reported as a potential GSH locus in Sus scrofa, Mus musculus, and Homo sapiens. To this end, the genes around the predicted GSH locus were compared to the genes around the validated GSH locus using the NCBI Genome Data Viewer (GDV) browser (https://www.ncbi.nlm.nih.gov/genome/gdv/). In the second step, the intergenic sequence between the DRG1 and EIF4ENIF1 genes from Sus scrofa, Mus musculus, and Homo sapiens was used as a template to perform a pairwise alignment (EMBOSS Water algorithm) against the corresponding intergenic sequence of the Gallus gallus domesticus (taxid: 9031) genome. In the third step, the predicted GSH locus was evaluated for the presence of possible annotated coding or non-coding genes using the NCBI GDV (Gallus gallus genome assembly GRCg6a) and the UCSC Genome Browser (https://genome.ucsc.edu/cgi-bin/hgGateway; chicken assembly GRCg6a/galGal6; Mar. 2018). In the fourth step, using chicken RNAseq data, the expression levels (transcripts per million; TPM) of the genes flanking the intergenic locus of interest were determined using the Gene Expression Atlas (https://www.ebi.ac.uk/gxa/home). In the fifth step, the chicken Hi-C data were used to predict the position of the GSH locus relative to its adjacent genes using the Hi-C map (Supplementary file 1). For visualizing the Hi-C map, the Juicebox software, version 1.9.0 (https://github.com/aidenlab/Juicebox/wiki/Download), was used to find the location of interest (the coordinate system of the map corresponds to the genome version GalGal5). Then, the gene coordinates were compared with the locations of TAD boundaries. The map with the gene tracks is available at the following URL (http://sites.icgbio.ru/ontogen/wp-content/uploads/MolMechDevDepart/GCF_000002315.4_Gallus_gallus-5.0_genomic.gff.genes.bed). Alternatively, pre-defined TAD boundaries were obtained from the following URL (http://sites.icgbio.ru/ontogen/wp-content/uploads/MolMechDevDepart/subTADs-ChEF-all-HindIII-40k.hm.gzipped_matrix.jucebox_domains.annotation).
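The gene-track BED file cited above can be used to locate an intergenic interval programmatically. The following is a minimal sketch, assuming a four-column BED layout (chromosome, start, end, gene name) with 0-based, half-open coordinates. The function names, gene names, and coordinates are hypothetical and are not taken from the chicken annotation.

```python
def parse_bed(lines):
    """Map gene name -> (chrom, start, end) from BED-formatted lines."""
    genes = {}
    for line in lines:
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name = line.split()[:4]
        genes[name] = (chrom, int(start), int(end))
    return genes


def intergenic_interval(genes, gene_a, gene_b):
    """Return (chrom, start, end) of the gap between two genes (half-open)."""
    chrom_a, start_a, end_a = genes[gene_a]
    chrom_b, start_b, end_b = genes[gene_b]
    if chrom_a != chrom_b:
        raise ValueError("genes are on different chromosomes")
    # Order the two genes by start coordinate.
    (_, first_end), (second_start, _) = sorted(
        [(start_a, end_a), (start_b, end_b)]
    )
    if first_end >= second_start:
        raise ValueError("genes overlap or are adjacent; no intergenic gap")
    return chrom_a, first_end, second_start


# Hypothetical records, for illustration only.
example = [
    "chrX\t100\t200\tGENE_A",
    "chrX\t350\t500\tGENE_B",
]
print(intergenic_interval(parse_bed(example), "GENE_A", "GENE_B"))
```

The resulting interval can then be passed to the annotation and TAD checks described in steps three and five.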


Journal of Cell and Molecular Research (2021) 13 (1), 1-9 


[Figure 1 image. Five-panel schematic of the prediction pipeline: (a) NCBI Genome Data Viewer comparison of the genes neighboring the locus in Homo sapiens (GRCh38.P13; Ch.22: NC_000022.11), Mus musculus (GRCm39; Ch.11: NC_000077.7), Sus scrofa (Sscrofa11.1; Ch.14: NC_010456.5), and Gallus gallus domesticus (GRCg6a; Ch.15: NC_006102.5); (b) EMBOSS Water pairwise alignment; (c) NCBI and UCSC genome browser screen for coding or non-coding genes; (d) Gene Expression Atlas TPM values (Merkin et al., 2012; Barbosa-Morais et al., 2012); (e) subTAD coordinates around the predicted GSH locus (Fishman, 2019).]

http://jcmr.um.ac.ir 




Chicken Genome Safe Harbor Regions (Dehdilani et al.) 


Figure 1. A schematic depiction for deciphering a genome safe harbor locus in the chicken genome. a) Comparison 
of gene distribution around the validated GSH locus in human (Homo sapiens), mouse (Mus musculus), and pig (Sus 
scrofa) genome with the same region in the chicken genome using NCBI Genome Data Viewer; b) Pairwise alignment 
of the predicted chicken GSH locus and validated GSH loci in human, mouse, and pig genomes by EMBOSS Water 
algorithm for calculation of sequence identity; c) Screening for possible coding or non-coding genes in the predicted 
chicken GSH locus by NCBI and UCSC genome data viewers; d) Expression levels of DRG1 and EIF4ENIF1 in
several tissues adopted from gene expression atlas; e) Coordinates of chicken subTADs around the predicted chicken 
GSH locus. Abbreviations: ch/chr, chromosome; bp, base pair; TPM, transcripts per million; NA, not applicable. 


Designing Cas9/gRNA expression and CRISPR-based targeting vectors

To design a highly specific gRNA for the predicted chicken GSH locus, the intergenic sequence between the DRG1 and EIF4ENIF1 genes was submitted to the CHOPCHOP search engine (https://chopchop.cbu.uib.no/). The predicted gRNA sequence with high specificity to the predicted GSH locus was selected and synthesized (Macrogen, South Korea) as 20-bp forward (P1, Table 1) and reverse (P2) oligonucleotides with appropriate overhangs (cacc for the forward oligonucleotide and aaac for the reverse oligonucleotide) to be cloned into the BbsI site of the gRNA expression vector (pSpCas9(BB)-2A-Puro (PX459) V2.0; Plasmid #62988; Addgene, USA). The annealed gRNA oligonucleotides were phosphorylated with T4 polynucleotide kinase (PNK; Thermo Fisher, EK0031) for 30 min at 37 °C, followed by a 30-min inactivation at 70 °C. To calculate the insert:vector molar ratio, the annealed oligonucleotides were run on a 2% agarose gel, and their integrated density index was determined using the ImageJ software (https://imagej.nih.gov/ij/download.html). The gRNA expression vector was digested with BbsI for 1 h and then heat-inactivated for 30 min at 70 °C. Then, it was dephosphorylated with fast alkaline phosphatase (Thermo Fisher, EF0654) for 10 min at 37 °C and heat-inactivated for 30 min at 70 °C. Using T4 DNA ligase (Thermo Fisher, EL0011), the phosphorylated gRNA oligonucleotides were ligated to the digested and dephosphorylated gRNA expression vector for 3 h at 16 °C, followed by 12 h at 4 °C.
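The oligonucleotide design described above (a 20-nt guide with a cacc overhang on the forward strand and an aaac overhang on the reverse-complement strand) can be sketched in a few lines. The function names are illustrative. The guide sequence is the 20-nt core of primer P1 in Table 1, and the output reproduces the P1 and P2 sequences listed there.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def bbsi_guide_oligos(guide):
    """Build forward/reverse oligos with BbsI-compatible overhangs.

    forward = CACC + guide
    reverse = AAAC + reverse complement of guide
    """
    guide = guide.upper()
    if len(guide) != 20 or set(guide) - set("ACGT"):
        raise ValueError("guide must be 20 nt of A, C, G, or T")
    forward = "CACC" + guide
    reverse = "AAAC" + reverse_complement(guide)
    return forward, reverse


forward, reverse = bbsi_guide_oligos("TCCAGTCACTAACAAAGTAC")
print(forward)  # CACCTCCAGTCACTAACAAAGTAC (P1)
print(reverse)  # AAACGTACTTTGTTAGTGACTGGA (P2)
```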

To design a CRISPR/Cas9-based targeting 
vector, a plasmid containing CMV-PAC‘-IRES- 
EGFP cassette (Figure 2b; constructed in this 
laboratory) was used. This plasmid contains PvuI-XhoI sites upstream of CMV and NheI-XcmI sites downstream of EGFP. Left and right homology arms (LHA and RHA, respectively) of approximately 500 bp were amplified from chicken genomic DNA with appropriate primers, which introduced a PvuI cut site at the 5'-end of LHA (P3), an XhoI cut site at the 3'-end of LHA (P4), an NheI cut site at the 5'-end of RHA (P5), and an XcmI cut site at the 3'-end of RHA (P6). The arms were cloned into the targeting vector in two steps. In the first step, the EGFP plasmid was cut with NheI-XcmI for 3 h at 37 °C, followed by dephosphorylation with fast alkaline phosphatase for 10 min at 37 °C and heat-inactivation for 30 min at 70 °C. The amplified RHA was cut with NheI and XcmI for 3 h at 37 °C. A 1:3 vector-to-insert molar ratio was used to ligate the amplified RHA into the EGFP vector with T4 DNA ligase. The generated vector was called the pre-targeting vector. In the second step, the pre-targeting vector was cut with PvuI and XhoI for 3 h at 37 °C, followed by dephosphorylation with fast alkaline phosphatase for 10 min at 37 °C and heat-inactivation for 30 min at 70 °C. The amplified LHA was cut with PvuI and XhoI for 3 h at 37 °C. A 1:3 vector-to-insert molar ratio was used to ligate the amplified LHA into the pre-targeting vector with T4 DNA ligase. The generated vector was called the CRISPR/Cas9-based targeting vector.
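As a worked illustration of the 1:3 vector-to-insert molar ratio, the insert mass can be derived from the vector mass and the two fragment lengths, since for double-stranded DNA the molar amount is proportional to mass divided by length. The sketch below uses the fragment sizes reported in Figure 2 (7469-bp vector backbone, 516-bp digested RHA). The 50 ng vector input is an assumed value for illustration only and is not stated in the protocol.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, insert_per_vector=3.0):
    """Mass of insert (ng) giving the requested insert:vector molar ratio.

    Moles of dsDNA scale with mass / length, so:
    insert_ng = vector_ng * (insert_bp / vector_bp) * insert_per_vector
    """
    if vector_bp <= 0 or insert_bp <= 0:
        raise ValueError("fragment lengths must be positive")
    return vector_ng * (insert_bp / vector_bp) * insert_per_vector


# 50 ng of vector is an assumption; sizes are taken from Figure 2.
rha_ng = insert_mass_ng(vector_ng=50, vector_bp=7469, insert_bp=516)
print(f"RHA insert needed: {rha_ng:.2f} ng")
```

With these inputs, roughly 10.4 ng of digested RHA corresponds to a 1:3 vector-to-insert molar ratio.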

For all cloning procedures, 5 µl of the ligation mix
was transformed into E. coli DH5α followed by
overnight incubation at 37 °C. Colony PCR was
performed on transformants with vector-specific and
insert-specific primers (Figure 2), and positive
clones were grown for plasmid extraction using the
Plasmid DNA Isolation Kit (DENAzist Asia, Iran).
Cloning verification was performed using restriction
enzyme digestion (Figure 2).

Figure 2. The proposed standard strategy for constructing the GSH targeting vector. a) A specific gRNA sequence in
the predicted chicken GSH locus and the corresponding Cas9 cut site (red arrow) are depicted. P3 and P4 primers
containing restriction enzyme sites for PvuI and XhoI, as well as P5 and P6 primers containing restriction enzyme sites
for NheI and XcmI, were designed and synthesized. b) A 544-bp PCR-amplified RHA was cut with NheI/XcmI. Then,
the 516-bp RHA was cloned into the EGFP vector (7469-bp) to generate a pre-targeting vector (8012-bp). c) Cloning
verification of RHA was performed by NheI/XcmI digestion. The 516-bp RHA band and 7469-bp vector backbone
were detected on the agarose gel. d) A 541-bp PCR-amplified LHA was cut with PvuI/XhoI. Then, the 503-bp LHA
was cloned into the pre-targeting vector (7489-bp) to generate a CRISPR-based targeting vector (7992-bp). Verification
of the LHA cloning was performed by PvuI/XhoI digestion. The 503-bp LHA band and 7489-bp vector backbone were detected
on the agarose gel. e) gRNA oligonucleotides (P1 and P2) containing BbsI overhangs were synthesized, annealed, and
phosphorylated. f) gRNA oligonucleotides containing BbsI overhangs were cloned into the BbsI-digested Cas9 vector.
g) Colony PCR was performed to verify the cloning of gRNA using P1 and P7 primers, and a 224-bp band was detected
on the agarose gel. Abbreviations: LHA, left homology arm; RHA, right homology arm; SM, size marker; NTC, non-
template control; bp, base pairs; gRNA, guide RNA; P, primer.


Results 


Gene similarities around the validated GSH loci 
and predicted GSH locus 

In mice (Tasic et al., 2011), humans (Zhu et al., 2014), and pigs (Ruan et al., 2015), the region between the DRG1 and EIF4ENIF1 genes has been identified as a validated GSH locus that supports the consistent expression of transgenes over time. The regions around the DRG1 and EIF4ENIF1 genes in the chicken genome are similar to those of the organisms mentioned above and include a gene-dense area. In the mouse, human, and pig genomes, the SFI1 and PATZ1 genes are located upstream of the EIF4ENIF1 and DRG1 genes, respectively. Also, the EIF4ENIF1 and DRG1 genes are oriented towards each other in these organisms. These data suggest that the gene organization around the predicted GSH locus in the chicken genome matches that of the validated GSH loci (Figure 1a).


Finding possible similarities using the Water 
algorithm of pairwise alignment 

To designate the orthologous GSH locus in the chicken genome, we first located the DRG1 and EIF4ENIF1 genes in the chicken genome. Local pairwise alignment (Water algorithm) identifies the most similar region(s) within the sequences to be aligned. It was performed to assess the similarity of the known GSH intergenic locus in the human (GRCh38.P13), mouse (GRCm39), and pig (Sscrofa11.1) genomes with the same locus in chicken. To this end, the following regions, all from the intergenic locus between the DRG1 and EIF4ENIF1 genes, were selected and pairwise aligned to an 1100-bp region of the same locus in the chicken genome (NC_052546.1; Ch.15 from 9286985..9286084): a 195-bp region from the human genome (NC_000022.11; Ch.22 from 31434452 to 31434648), a 5318-bp region from the mouse genome (NC_000077.7; Ch.11 from 3194588 to 3199907), and a 3731-bp region from the pig genome (NC_010456.5; Ch.14 from 48153103 to 48156835). The results showed that the intergenic locus between the DRG1 and EIF4ENIF1 genes in the chicken genome has similarity scores of 37.5%, 40.9%, and 39.2% with the corresponding regions from the human, mouse, and pig genomes, respectively (Figure 1b).
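EMBOSS Water implements Smith-Waterman local alignment. The sketch below illustrates the same idea in simplified form: it uses a single linear gap penalty and flat match/mismatch scores, whereas EMBOSS Water uses a substitution matrix with separate gap-open and gap-extend penalties, so its scores and identities will generally differ. The scoring values and the toy sequences are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=5, mismatch=-4, gap=-10):
    """Local alignment with a linear gap penalty (simplified sketch)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(top, bottom):
    """Identical aligned positions as a percentage of alignment length."""
    if not top:
        return 0.0
    same = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    return 100.0 * same / len(top)


# Toy sequences, not genomic data.
s, top, bottom = smith_waterman("TTTACGTACGTTT", "GGGACGTACGGGG")
print(s, top, bottom, percent_identity(top, bottom))
```

Identity is computed over the aligned region only, which is why a local alignment of a short region against a longer one can report a moderate percentage even when the full sequences differ substantially in length.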


Searching for possible coding and non-coding 
genes in the predicted GSH locus 

The presence of any annotated coding and non-
coding genes in the intergenic region between the
chicken DRG1 and EIF4ENIF1 genes was evaluated
using the UCSC Genome Browser (chicken
assembly; Mar. 2018 GRCg6a/galGal6), and the
NCBI Genome Data Viewer (GRCg6a). The results
showed no annotated coding or non-coding genes in the region
between these two genes (Figure 1c).


Evaluating the transcriptional status of genes 
adjacent to the predicted GSH locus 

It has been demonstrated that a transgene can
be expressed reliably in actively transcribed
regions of the genome. The transcriptional status of the
DRG1 and EIF4ENIF1 genes was evaluated using
RNAseq data adopted from the Gene Expression
Atlas. The RNAseq data showed that the DRG1 gene is
actively transcribed in several tissues. In contrast,
the transcription of the EIF4ENIF1 gene is variable
among tissues. Hence, the insertion site of the
transgene of interest was designated near the DRG1
gene (Figure 1d).
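For readers unfamiliar with the TPM unit used above: read counts are first divided by transcript length, and the resulting rates are rescaled so that they sum to one million within a sample. The sketch below shows this calculation; the gene names, counts, and lengths are hypothetical and are not Gene Expression Atlas values.

```python
def tpm(counts, lengths_kb):
    """Transcripts per million from raw counts and transcript lengths (kb)."""
    # Length-normalized read rate for each gene.
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads in sample")
    # Rescale so all values in the sample sum to one million.
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


# Hypothetical sample with three genes.
counts = {"GENE_A": 300, "GENE_B": 100, "GENE_C": 600}
lengths_kb = {"GENE_A": 3.0, "GENE_B": 1.0, "GENE_C": 2.0}
for gene, value in tpm(counts, lengths_kb).items():
    print(gene, round(value))
```

Because TPM values sum to the same total in every sample, they allow the kind of cross-tissue comparison of a single gene used here to judge whether a flanking gene is broadly transcribed.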


Designating the topological location of the 
predicted GSH locus using the TAD data 

Several studies have confirmed that gene clusters
located in a given TAD are regulated similarly
(Hilliard and Lee, 2021). Transgenes inserted into a
TAD containing actively transcribed genes maintain
their transcriptional activity. Using the Hi-C
data of chicken embryonic fibroblasts (Fishman et
al., 2019), we located the topological position of the
DRG1 and EIF4ENIF1 genes relative to the adjacent
TAD. Interestingly, the transgene insertion
site was located in an individual active TAD near the
DRG1 gene (Figure 1e).
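Comparing gene coordinates with TAD boundaries amounts to an interval lookup. The sketch below is one possible way to do this, assuming the domains are available as sorted, non-overlapping, half-open (start, end) intervals on a single chromosome. The function names and all coordinates are hypothetical and do not reproduce the published subTAD annotation.

```python
from bisect import bisect_right


def domain_index(domains, position):
    """Index of the domain containing position, or None if between domains."""
    starts = [start for start, _ in domains]
    k = bisect_right(starts, position) - 1
    if k >= 0 and domains[k][0] <= position < domains[k][1]:
        return k
    return None


def share_domain(domains, pos_a, pos_b):
    """True if both positions fall inside the same domain."""
    index_a = domain_index(domains, pos_a)
    index_b = domain_index(domains, pos_b)
    return index_a is not None and index_a == index_b


# Hypothetical domains and positions on one chromosome.
domains = [(0, 100), (100, 250), (400, 500)]
insertion_site, gene_midpoint = 120, 240
print(share_domain(domains, insertion_site, gene_midpoint))
```

A candidate insertion site that shares a domain with an actively transcribed gene would, under the reasoning above, be expected to sit in a transcriptionally permissive environment.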


Constructing gRNA expression and 
CRISPR/Cas9-based targeting vectors 

Synthetic gRNA oligonucleotides specific to the
predicted GSH locus were cloned into the Cas9
expression vector and confirmed with P1 and P7
primers (Table 1). Five transformant clones were
checked by colony PCR. To construct the
CRISPR/Cas9-based targeting vector, the isogenic
544-bp right homology arm was amplified using P5
and P6 primers (Table 1). Then, it was cloned into
the EGFP vector to generate the pre-targeting vector.
The pre-targeting vector
was cut with NheI and XcmI, and the 516-bp and
7469-bp bands were detected on the agarose gel.
Then, the isogenic 541-bp left homology arm was
amplified using P3 and P4 primers (Table 1),
confirmed, and cloned into the pre-targeting vector.
For verification of LHA cloning, the CRISPR-based
targeting vector was cut with PvuI and XhoI, and
503-bp and 7489-bp bands were detected on the
agarose gel.

Table 1. Primer sequences.

Primer   Sequence (5' to 3')                                 Length
P1       CACCTCCAGTCACTAACAAAGTAC                            20-mer
P2       AAACGTACTTTGTTAGTGACTGGA                            20-mer
P3       CATGCATTAGTTCGCGATCGAGCCCTAGGGGAGGTCCTG             39-mer
P4       TGGCGACCGGTACCCTCGAGAAGAATTTCCTGCTTATTTGACTTCTCC    48-mer
P5       CTTTCTAGGGTTAAGCTAGCCTTCCACTAGTATAAACAATTG          42-mer
P6       TGGTGCCACCTATGTTGTGGAGAAATAAAACTGCTCTCCC            40-mer
P7       CGGGCCATTTACCGTAAG                                  18-mer

Discussion

One of the most important applications of GSH loci identification is the use of these regions for the insertion of transgenes and the generation of transgenic animals that can be used as bioproduction systems (Li et al., 2019; Ruan et al., 2015). It is possible to achieve consistent and reliable expression of the transgene by stable chromosomal insertion of the exogenous DNA at a GSH locus (Shin et al., 2020). Since the generation of transgenic animals is expensive and time-consuming, the prediction of potential GSH loci can be helpful, preventing potential transgene silencing over generations.

Traditionally, random integration of a given transgene followed by screening of transgene expression across the genome has been used to find highly expressed regions (Zambrowicz et al., 1997). This method is accompanied by cumbersome and time-consuming steps, including screening of the integrated transgenes, analyzing their expression levels, and identification of reliable GSH regions. Moreover, in the random integration approach, many regions, including intergenic and intragenic regions, are targeted and screened, and some intragenic regions could be selected as potential GSH regions. In some previous studies, new GSH regions were identified by genome-wide comprehensive analyses (Ma et al., 2018) or by using available bioinformatics data to search for potential GSH regions (Yang et al., 2016b; Irion et al., 2007; Kobayashi et al., 2012; Lee et al., 2019; Li et al., 2014; Liu et al., 2018; Rizzi et al., 2017; Ruan et al., 2015; Stanford et al., 2001; Tasic et al., 2011; Wu et al., 2016; Yang et al., 2016a; Zhu et al., 2014). It has been discussed that the utilization of intragenic regions to express a transgene may lead to transgene silencing, disruption of endogenous genes, or even induction of oncogenes (Tolmachov et al., 2013).

Whole transcriptome analysis has been performed to widely analyze gene expression levels in a range of organisms and tissues (Ma et al., 2018). Thus, an intergenic region between two highly expressed genes could be a potential GSH candidate (Tasic et al., 2011). Nowadays, with the advent of CRISPR/Cas technology, a transgene of interest can be precisely integrated into the candidate GSH region (Kimura et al., 2014). It has been demonstrated that actively expressed gene-rich regions may support reliable, consistent, and long-term expression of the transgene. For example, the intergenic region between the DRG1 and EIF4ENIF1 genes has been known as a GSH locus in several animals, including mice, humans, and pigs. It has been revealed that both DRG1 and EIF4ENIF1 genes have broad spatial and temporal expressed sequence tag (EST) expression patterns (Hippenmeyer et al., 2010) and could reliably support transgene expression (Pryzhkova et al., 2020).

The DNA sequences between the DRG1 and
EIF4ENIF1 genes in mice were compared with
those from the same locus in humans to determine
the level of sequence identity. Results showed that
there was a 45% similarity (Zhu et al., 2014).
Functional validation verified that the region was
suitable as a safe location for the placement of
transgenes in human cells. In another study, the
prediction of the GSH locus was accomplished by
the assessment of the similarity of adjacent genes
and their intron/exon organization. For example, the
intergenic sequence between the pig DRG1 and
EIF4ENIF1 genes was predicted to be a GSH due to
its similarity to the adjacent genes of mice (Ruan et
al., 2015). Although the GSH locus between these
two genes was successfully predicted in the human
and pig genomes based on a similarity search, there
are two caveats in these studies: i) the levels of
expression of these two genes were not evaluated,
and ii) the position of the DRG1 and
EIF4ENIF1 genes relative to the locations of TAD
boundaries was not investigated.

It has been demonstrated that transcriptionally
permissive chromatin structures can support the
consistent and reliable expression of a transgene. For
example, 10.9% of the Chinese hamster ovary
(CHO) cell genome contains actively transcribed 3D
chromatin structures, leading to the stability of
transgene expression during the cell development
process (Hilliard and Lee, 2021). Therefore, the
transcriptionally permissive 3D chromatin structures
could be easily predicted using RNAseq and Hi-C
data. Also, analysis of NucMap data should allow
better predictions of GSH regions in the future (Zhao
et al., 2019).

Here, we used a multi-faceted approach to predict
a GSH region in the chicken genome based on
similarity search, RNAseq data, and Hi-C data. The
easy-to-use and fast pipeline for the prediction of
GSH regions before generating transgenic
animals can facilitate industrial research and
development procedures. In this study, for the first
time, we introduce the GSH locus located between
the DRG1 and EIF4ENIF1 genes in the chicken
genome. It has been demonstrated that intergenic
regions show less nucleosome
occupancy than intragenic regions (Voong et al.,
2016). Therefore, an intergenic locus is less subject to silencing. In
addition, Tasic and colleagues showed that
molecular integration tools could have better access
to intergenic regions in comparison to intragenic
regions in mouse embryonic stem cells (Tasic et
al., 2011). Also, consistent expression of the
desired transgene without silencing for over 30
passages has been reported (Zhu et al., 2014). Hence,
the GSH locus located between the DRG1 and
EIF4ENIF1 genes can be reliably applied to
generate transgenic animals.